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Progress in lattice algorithms 

Mike Peardon a 

a School of Mathematics, Trinity College, Dublin, Dublin 2, Ireland. 

The development of Monte Carlo algorithms for generating gauge field configurations with dynamical fermions 
and methods for extracting the most information from ensembles are summarised. 



1. INTRODUCTION 

This conference illustrated that most large col- 
laborations actively investigating QCD physics 
are performing simulations of the theory with 
dynamical quark fields. Dynamical staggered 
fermion simulations have been pursued for many 
years now, however it is clear that the commu- 
nity is now also firmly in the era of full Wilson 
fermion studies. These calculations will address 
one of the biggest systematic error in many non- 
perturbative QCD predictions from the lattice; 
use of the "quenched approximation" . At present, 
most simulations are not being performed at re- 
alistic values of the quark masses, or with the 
correct number of flavours of light fermions. A 
technically challenging extrapolation of data to 
the up and down quark masses must subsequently 
be performed. 

At the conference, statements were given 
by the biggest collaborations. Estimated costs 
for reproducing the state-of-the-art in quenched 
spectrum data in the full theory were in the range 
10-100 Tflop-years. In this review, efforts to de- 
velop better algorithms to simulate closer to the 
physical parameters of QCD are discussed. 
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In this section, the main features of the tech- 
niques for simulating dynamical Wilson fermions 
are summarised. The simulations of dynamical 
staggered fermions are more mature; the principle 
difficulty at present is one of controlling the choice 
of discretisation M . The new challenge is to find 



efficient methods for including recently proposed 
improved discretisations. Developing algorithms 
incorporating "fat-link" gauge fields, designed to 
enhance the flavour symmetry of the action are 
presented later. It is worth noting that the exact 
algorithms developed for odd-flavour simulations 
with Wilson fermions are being applied to stag- 
gered fermion simulations. 

The Hybrid Monte Carlo (HMC) algorithm @ 
is still the work-horse powering most large Wil- 
son fermion simulations. Over the past few years 
there have been further advances in our under- 
standing of the issue of finite precision arithmetic 
and the reversibility and stability of the molecu- 
lar dynamics (MD) integrator component. HMC 
is an exact algorithm if the MD integrator is re- 
versible and area preserving. It has long been un- 
derstood that the Hamiltonian evolution of QCD 
is chaotic, and the Liapunov exponents were stud- 
ied in detail More recently, the stability of 
the MD leap-frog integrator || has been investi- 
gated in detail by the UKQCD collaboration 
Their findings were presented to the conference by 
Joo Q . Analysis of a leap-frog integrator of the 
simple harmonic oscillator reveals the solution di- 
verges once the step-size exceeds a critical value. 
The problem is exacerbated for higher-order in- 
tegrators. This same behaviour is observed in 
QCD, and the critical step-size is seen to fall as 
the quark mass, m q — > 0. The suggestion is that 
the problem of simulating light Wilson fermions 
with HMC will amount to managing this stability 
problem. 

There has been progress in implementations of 
the local boson algorithm (LBA) ||. This al- 
gorithm relies on first constructing a polynomial 



2 



approximation to the inverse of the fermion ma- 
trix, then expressing the polynomial as a product 
of roots. The determinant is expressed as a lo- 
cal bosonic partition function involving a large 
number (the degree of the polynomial) of auxil- 
iary fields. The beauty of the method is that a 
local change in the gauge fields leads to a local 
computation for the change in the action. Un- 
fortunately, the simplest implementations of this 
method suffer from long autocorrelations Q; a 
large number of auxiliary fields are coupled to the 
gauge fields, and so only small changes in these 
degrees of freedom can be made in each sweep. 
The problem can be circumvented by using lower- 
order polynomials, and correcting for any discrep- 
ancy between the approximation and the true in- 
verse using a noisy accept-reject step Q. This 
idea lead to the development of the two-step 
multi-boson (TSMB) algorithm |0j], where two 
polynomials are constructed, V\{x) (low-order) 
and Vi(x) (high-order) such that 
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(1) 



Vi(x) is replaced by a partition function of a 
small number of boson fields, and T > 2{x) is used 
in the acceptance test. 

Polynomial approximations have been incorpo- 
rated into other techniques. The Polynomial Hy- 
brid Monte Carlo (PHMC) algorithm Q pro- 
posed by Frezzotti and Jansen is one such devel- 
opment. A single auxiliary field, 4> is introduced, 
rather than the large number of LBA, and the 
action is 



S 4> = S G [U]+^V{M^M)4> 



(2) 



This avoids the stiff update dynamics, however 
the action is now non-local, so HMC is used to 
update the gauge fields. The algorithm can be 
made exact either via an accept-reject step or by a 
re- weighting of expectation values by a correction 
factor: 



(C)truc — 



(O dctQ 2 V(Q 2 ))^ 
(detQ 2 7>(Q 2 )) " 



(3) 



Frezzotti and Jansen advocate this re-weighting 
and the break-down of the approximation close 
to zero is now regarded as advantageous. The 



polynomial weight generates more configurations 
with low eigenvalues (which are then assigned a 
lower weight in the ensemble average), and this 
over-sampling should lead to a more reliable de- 
termination of quantities that are sensitive to the 
details of the lowest eigenvalues. 

The ALPHA collaboration have made a com- 



parison 1 13 1 between PHMC and pseudofcrmionic 
HMC. In the range of parameters of interest, they 
find there is a slight advantage to PHMC and 
favour this method for future simulations, on ac- 
count of its over-sampling property. Schroers pre- 
sented results to the conference Q that suggest 
the performance of the tuned LBA is faster than 
the SESAM production HMC code. Performance 
comparisons between TSMB and HMC in com- 
pact QED were presented to the conference by 
Zverev Jig ]. The study finds the two algorithms 
are equally competitive, although the authors em- 
phasise their TSMB implementation can still be 
optimised by changing the local gauge update 
scheme and tuning the polynomials V\ and V2 
more carefully. 

One remarkable feature of these comparisons 
is that the results are, in fact, so close, de For- 
crand emphasises that the LBA algorithm turns 
into a standard local heatbath/over-relaxation al- 
gorithm as the fermions are made heavier, and 
the expectation there is that the over-relaxation 
algorithms should out-perform (global) HMC sig- 
nificantly. 

3. ODD-FLAVOUR SIMULATIONS 

Since QCD has three flavours of light fermions, 
there is a natural incentive to study algorithms 
for simulating an odd number of fermion flavours. 
The standard pseudofermion formulation is un- 
suitable, since the Wilson matrix can have eigen- 
values with negative real parts, meaning there are 
regions of configuration space where the gaussian 
integral is not defined. One extremely useful de- 
velopment arising from the local boson method 
has been the possibility of performing these odd- 
flavour simulations, even maintaining an exact 
update algorithm. Since these simulations still 
rely on gaussian integrals, they in fact generate 
configurations with probability measure defined 
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from the modified partition function, 
Z + = [vU\detM\ e- s °. (4) 



The sign of the determinant can be included in a 
post-hoc reweighting of observables: 

(O sgn(det M)) + 



(Oh 



(5) 



(sgn(detM)) + 

This may lead to a "sign problem" , where statisti- 
cal estimates of reweighted observables fluctuate 
wildly if a mixture of configurations with both 
positive and negative sign determinants occur in 
the ensemble. Empirical studies of this problem 
for new simulations of QCD with three quarks 
(with a target of m q « m s /4) were presented to 
the conference by Gebert Jig ], They expect the 
sign problem to be very mild in this region. 

The local boson algorithm can approximate the 
partition function of Eqn. |J; starting from a poly- 
nomial approximation to lf-y/x in the range [e, 1] 
leads directly to the one-flavour partition func- 
tion. Polynomials of the non-hermitian fermion 
matrix M also present a solution, as suggested 
in Rcf. and developed in Ref. A 
polynomial approximation to M _1 of order 2n is 
constructed, 



M" 1 ss V(M) = H(M-z k ). 



(6) 



k=l 



For a suitably chosen polynomial, the roots, z k 
come in complex-conjugate pairs, so the polyno- 
mial can be written 



V(M) = f[(M-z k )(M-z* k ). 



k=l 



The 75 hermiticity of M means 
det(M -z%) = det(Af * - z* k ) 



(7) 



(8) 



so |detM| can be represented approximately by 
a gaussian integral, 



| det M | « JV(j)Dcf>* exp {-0*T^(M)T„(M)0}(9) 
with 

n 

T n (M) = ~[[(M-z k ). (10) 



The scheme can be made exact with a Metropo- 
lis test; two alternatives were proposed in Refs 
b oth of which rely on a noisy Kennedy- 
Kuti |fl9f acceptance test. The three-flavour sim- 
ulations of QCD are being carried out by the 
JLQCD collaboration and results were presented 
to the conference f2(i(| . 

4. ACCELERATING HMC 

HMC remains the most popular tech- 
nique for generating ensembles of gauge fields 
with two flavours of dynamical Wilson (and 
Shcikholcslami-Wohlcrt improved) fermions. 
Most of the simulations use this well-established 
method, so it is useful to find simple schemes to 
enhance the method. 

4.1. ILU preconditioning 

Replacing the fermion determinant by a gaus- 
sian integral couples extra stochastic degrees of 
freedom to the gauge fields. This in turn leads 
to a lower Metropolis acceptance rate in HMC. It 
has been known for a long time that coupling the 
gauge fields and pseudofermions with the even- 
odd preconditioned fermion matrix (whose deter- 
minant is equal to that of the original fermion ma- 
trix) leads to a higher acceptance rate for a given 
molecular dynamics step-size pj] . The SESAM 
collaboration [^] emphasised that the even-odd 
preconditioning scheme is an example of an in- 
complete LU factorisation technique for matrix 
preconditioning. ILU preconditions a matrix by 
left and right multiplication by two readily invert- 
ible matrices; 



M = (I — £) _1 M (J — U)^ 1 



(11) 



where L and U are the lower and upper sections of 
the matrix. Defining L and U first requires sites 
on the lattice, x are assigned an index, s(x) then 
site ordering is defined as 



x > y if s(x) > s(y). 



(12) 



fe=i 



SESAM developed a site ordering that is well- 
suited to accelerate matrix inversion on parallel 
computers. Since this preconditioning leaves the 
determinant of the matrix invariant, M can be 
used to couple the pseudofermions to the gauge 
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Figure 1. Schwinger model HMC Metropolis ac- 
ceptance probability for pseudofermions coupled 
via different preconditioned matrices E3). 



fields. This holds for any choice of ordering 
function, s(x). In Ref. (^3|, different site or- 
derings were tested in HMC simulations of the 
two-flavour Schwinger model. An unusual pat- 
tern emerged; the optimal ordering for speeding 
up matrix inversion was the global lexicographic 
scheme, while the best ordering for improving 
Metropolis acceptance was the usual even-odd 
preconditioner. In Ref. |p4) , it was noted that 
the even-odd matrix itself can be ILU factorised 
again, and this both accelerates matrix inver- 
sion and improves the Metropolis acceptance rate. 
Ref. jH tested a range of two step (eo-ILU) 
preconditioned matrices (defined now by differ- 
ent choices of an indexing function, s(x e ) on the 
even sub-lattice only). These simulations found 
the same pattern emerged; the optimal ordering 
for inversion is a globally lexicographic one, while 
using a "locally ordered" scheme is best for im- 
proving Metropolis acceptance. Using an eo-ILU 
ordered interaction allows a three times larger 
step-size for a given Metropolis acceptance rate. 
The naive expectation that the autocorrelations 
of the update scheme are independent of the inte- 
grator step-size at a fixed acceptance is supported 
by simulation data. 



4.2. Splitting the pseudofermions 

Hasenbusch |25| recently proposed modifying 
the pseudofermion sector of HMC to improve the 
acceptance rate for a fixed molecular dynamics 
step-size. The key observation is (as above) that 
coupling the gauge fields to pseudofermions via 
a well-conditioned matrix allows a larger molec- 
ular dynamics step-size to be taken for a fixed 
Metropolis acceptance. The fermion determinant 
is first split into two pieces, namely 



dct M = det MM' 1 det M. 



(13) 



The two-flavour determinant is then represented 
by a gaussian integral, 

detA/ 2 = /p[0,0*^,^*]exp{-^-^} (14) 



with 

S 6 = \MM~ 



and S, 



|M~V| S 



(15) 



Hasenbusch then chooses M = I — k A and recog- 
nises that for < k < k, the two matrices MM' 1 
and M are better conditioned than M. This 
leads to improved acceptance. In Ref pBJ , Hasen- 
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Figure 2. Schwinger model simulation cost vs k 
for the split pseudofermion scheme p5j,EG| . 



busch tested the idea in the two-flavour Schwinger 
model and at the conference [p6| , results for pre- 
liminary studies in QCD were presented. The 
cost of HMC simulations (defined in units of ma- 
trix x vector operations) as a function of the 
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splitting parameter, k is summarised in Fig. |2j In 
the Schwinger model study, simulations are accel- 
erated by a factor of 1.7 when the pseudoscalar 
meson mass, rap — 0.210(3), rising to a speed-up 
of 2 when mp = 0.124(5). This increase at lighter 
fermion mass is an encouraging result. The stud- 
ies in QCD [2(J are more exploratory, but effects 
of similar magnitude are seen. 

4.3. Multiple time-scale integration 

Based on the idea of separating the infra-red 
and ultra-violet sectors of the fermion matrix, 
Jim Sexton and I are working on a practical im- 
plementation of a multiple time-scale molecular 
dynamics integrator. A low-order polynomial ap- 
proximation to the fermion matrix inverse, which 
captures the short distance scale effects of quark 
loops is used, along with a pseudofcrmion ac- 
tion to reproduce long-range physics. The force 
on the gauge field from the polynomial term in 
the action is integrated (along with the easy-to- 
compute force from the Yang-Mills discretisation) 
using a finer time-step, and the long-range modes, 
contained in the pseudofermion action are inte- 
grated using a larger time step p?| . 

4.4. Are these schemes compatible? 

If these ideas could be combined, the possibil- 
ity of an order-of-magnitude acceleration of Wil- 
son fermion simulations would open up. ILU 



preconditioned pseudofermions (Sec. 4.1) can be 
used straightfor ward ly in the multiple time-scale 
scheme of Sec. Li. This idea is under inves- 
tigation, and it seems likely the benefits from 
each component will be combined. Hasenbusch 
demonstrated directly that his scheme works with 
even-odd preconditioned pseudofermions. 

5. NEW DEVELOPMENTS 

The Kentucky group (28) are developing a new 
dynamical fermion updater based on a Kennedy- 
Kuti noisy acceptance test on changes in the de- 
terminant. The ratio of the determinants evalu- 
ated on the new and old configurations is com- 
puted using a Zi stochastic estimate of the trace 
of a Pade approximation to the logarithm of the 
fermion matrix. The scheme is still being devel- 
oped. One interesting feature [E9J is the possibil- 



ity of using the algorithm to simulate finite den- 
sity QCD, by including a projection onto states 
with definite particle number. 

Bakeyev ||(| has proposed a new scheme for 
performing exact, finite updates of the gauge 
fields interacting with dynamical fermions. The 
update involves finding a new configuration, U 
that solves 



M(U)rj = ojM(U)rj 



(16) 



for a stochastic parameter, lu and gaussian noise 
source, r\. The idea is in its infancy as yet, and 
may prove to be most useful as an additional step 
in an HMC simulation. 

The flavour-symmetry breaking of staggered 
fermions can be reduced by using a "fat link" 
to induce quark-gluon interactions. Simulating 
these staggered actions poses a new problem in 
algorithm design. The fat link is constructed 
following a similar procedure to APE smearing, 
which involves summing the six staples around 
a link, then projecting the resulting matrix back 
into the gauge group. The fattening step is re- 
peated iteratively. The challenge for simulation 
is that the force term on the underlying gauge 
fields is difficult to compute. New algorithms are 
being developed and tested to circumvent this 
problem. Knechtli |31| presented these ideas to 
the conference. HMC fat-link simulations with 
four flavours when the lightest pion has mass 
rom, = 2.0 are seven times more costly than 
runs where the quarks interact directly with the 
link variables. This factor is reduced to about 
3 once r^m^ reaches 1.6 This extra overhead is 
more than offset by the much better flavour sym- 
metry of the fat-link actions, and the quark mass 
accessible to these studies should be much lighter. 

6. HIERARCHICAL NOISE REDUC- 
TION IN YANG-MILLS THEORIES 



Recently, Liischer and Weisz |32j proposed a 
novel algorithm for reducing the stochastic noise 
in large Wilson loop expectation values by a fac- 
tor that grows exponentially with the loop size. 
The scheme involves building a hierarchy of sub- 
domains within the lattice, and performing a re- 
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Figure 3. The Polyakov loop correlator on a 
12 4 ,/3 = 5.7 lattice, computed using the hierar- 
chical noise reduction method. Statistical errors 
are smaller than the plot symbols |32| . 



duced Monte-Carlo integration over the degrees 
of freedom inside each domain. The results at 
each level are passed up the hierarchy to con- 
struct variance reduced averages of larger and 
larger objects. The scheme is a natural exten- 
sion of the temporal link "multi-hit" method j33| . 
They illustrate their proposal with a calcula- 
tion of the correlator of two spatially-separated 
Polyakov lines (see Fig. |J) , and show a reduction 
of many orders of magnitude in the statistical er- 
ror over the standard technique. 

The scheme can be extended to include mat- 
ter fields. If the fermion determinant is included 
using the LBA, the action is localised within 
a small neighbourhood. With Wilson fermions 
(and the Wilson parameter set to unity) the hi- 
erarchical measurement scheme can be incorpo- 
rated without modification. The difficulty is in- 
cluding the stochastic accept-reject schemes usu- 
ally employed with the LBA to make the method 
exact. These rely on lattice- wide observations, 
involving all the gauge degrees of freedom and 
the domain decompositions of Ref [p2[ would no 
longer apply. 

It remains an unsolved challenge to include 



the highly developed tools for optimising ground- 
state overlap into the hierarchy scheme. Methods 
such as APE smearing and Teper blocking have 
proved crucial in the accurate calculations of the 
inter-quark potential and its excitations in Yang- 
Mills theory. Variational techniques, which build 
a large basis sets of creation operators for the 
states of interest, then find an optimal ground- 
state operator by diagonalisation have also dra- 
matically improved these calculations (for an ex- 
ample, see Ref. Q). While the theoretical as- 
pect of including these ideas is readily resolved, 
it is not obvious how well this toolkit can be used 
within the hierarchy method in practice. An em- 
pirical test seems the only way to resolve this is- 
sue. If the noise reduction of the method of Ref. 
Q can be married to the excellent ground-state 
construction of the smoothing and operator basis 
methods, then extremely precise calculations of 
the detailed nature of the confining string spec- 
trum at large separations could be performed. 

7. ALL-TO-ALL FERMION PROPAGA- 
TORS 

Having expended such a considerable number 
of cycles on some of the world's largest supercom- 
puters in generating an ensemble of gauge field 
configurations, it is clear we are duty bound to 
make the most of the information they contain. 

For many observables, believed to be sensitive 
to vacuum fluctuations in the quark fields, tra- 
ditional point-source propagator methods are in- 
efficient. To extract a useful signal from the en- 
semble, the propagator from all (or many) points 
to all points on the lattice is needed. There has 
been a good deal of interest in improving these 
techniques in recent years. 

7.1. Gaussian and Z2 stochastic estimators 

A gaussian representation of a fermion matrix 
entry can be written [551] : 



V<])V(f>* (j}i{(f)*Q)j exp {-<t>*Q 2 <t>} . (17) 



Thus a propagator from any point on the lattice 
to any other can be estimated by performing a 
sub-Monte Carlo simulation on each configura- 
tion. 
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At the conference, Duncan |36| presented simu- 
lation data using a gaussian all-to-all scheme. He 
emphasised that for many interesting physics ap- 
plications, this simplest of methods is adequate 
to ensure the dominant source of statistical er- 
ror is from fluctuations within the gauge ensem- 
ble, rather than the stochastic estimator. The 
stochastic degrees of freedom can be updated in 
a global heatbath step, but this requires a matrix 
inversion. Ref. j37| gives a scheme to accelerate 
the process; the iterative matrix solver is stopped 
after a small number of steps, and a Metropolis 
test is used to correct for the error in this step. 

The Kentucky group Q noted that other 
stochastic variables can be used to compute ma- 
trix traces, and exploited random elements of Zi. 
These estimators have been employed in a range 
of calculations involving disconnected diagrams. 

7.2. Improving stochastic estimators 

Ref. H^] suggests reducing the noise in a gaus- 
sian estimator by dividing the lattice into two 
segments, then computing the independent sets 
of integrals in each segment exactly in a fixed 
background of the degrees of freedom on the in- 
terface between the segments. Conceptually, the 
new Luscher-Weisz hierarchy method, described 
in Sec. |^ is reminiscent of this scheme. The 
method has one restriction; the source and sink 
points of the propagator can not be in the same 
domain, so closed fermion loops can not be com- 
puted. In Ref. a further enhancement of the 
method is presented. 

At the conference, a method was presented by 
Wilcox jll| to mix gaussian and Zi noise, al- 
lowing an approximate heatbath with Metropolis 
correction to be used. 

The SESAM collaboration [|| improved Z 2 es- 
timators by breaking the vector space into sub- 
spaces (in their study, the 4 sub-spaces corre- 
sponding to the spin indices) and performing 
separate stochastic estimators in each sub-space. 
The idea was tested further in Ref jl3) . While this 
means more matrix inversion must be performed 
for a fixed stochastic ensemble size, the resulting 
estimators can have a significantly smaller vari- 
ance. This idea of thinning can be extended be- 
yond just spin and colour indices to spatial sites 



as well. If the scheme were to be carried to the 
extreme, where the N dimensional vector space 
is decomposed into N 1-dimensional spaces, a Z^ 
estimator would reproduce the exact result for 
any ensemble size. This is not such a surprising 
result, since this computation would require N 
inversions to be performed and it is a trivial re- 
sult that the trace of M~ x can be computed in N 
inversions. Without thinning however the error 
falls off, as any statistical estimate should, like 
l/>/N so eventually thinning must become more 
efficient. 

The thinning procedure has not been explored 
fully for computations of all-to-all propagators in 
QCD, and it would be worthwhile to investigate 
this more completely. 

7.3. Eigenvalue decomposition 

If all the eigenvectors (and eigenvalues) of the 
fermion matrix are known, then any propagator 
entry can be computed straightforwardly. It is 
natural to expect that the long-range physics of 
QCD are contained in the lowest-lying eigenvec- 
tors, and these will then dominate in the spec- 
tral representation of a mesonic correlator (for 
example). The importance of understanding the 
physics of the lowest modes of the fermion matrix 
was emphasised in the plenary talk of Edwards 

Neff presented the conference J45| with an in- 
vestigation of methods for computing the low- 
lying eigenvectors efficiently. The acceleration 
comes from using a polynomial to isolate the 
region of eigenvalues of interest, and this leads 
to more rapid converges of the Arnoldi method. 
At the conference, Schilling [^t|fl7j discussed the 
dominance of the 7r and rj correlators by the low- 
lying eigenmodes. Truncating the spectral expan- 
sion introduces a systematic error, but this can be 
avoided by computing all-to-all propagators in a 
hybrid scheme (as noted by Edwards Q). 

7.4. Hybrid schemes 

The two schemes can be combined straightfor- 
wardly to construct a hybrid all-to-all estimator 
method J44| . A number of low- lying eigenvalues 
of the fermion matrix are computed and incor- 
porated exactly and the effect of all the remain- 
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Figure 4. The disconnected (hairpin) correlator 
for the rj' , computed using both an eigenvalue 
decomposition (TEA) and stochastic estimators 
(SET) @. 

ing eigenvalues is estimated stochastically. This 
involves building a projection operator onto the 
reduced sub-space spanned by the explicitly com- 
puted eigenvectors. 

For illustration, consider the observable 

N 1 

TrQ- 1 r = ^- v*T Vi . (18) 

z 

If the lowest m (m <g; N) eigenvectors of the her- 
mitian matrix Q = 75M are computed, then it is 
natural break the the spectral representation of 
Q into two parts, 

m N 

Q = J2KViV*+ x ^<- ( 19 ) 

i— 1 i— m+1 

Defining the (snb-space) inverses and projectors, 

rn . m 

the trace becomes 

Tr Q- X r = Tr Q (0) T + Tr Q (1) I\ 

The first term is the truncated estimate, and the 
remaining term can be estimated stochastically; 

Tr Q (1) r = (77*rQ (1) ?7) 

= (v^iQ' 1 - Q(p))v) 

= (r,*TQ- l {I-V {0) )v). (21) 



The operation of / — V(p) is an orthogonalisation 
of the noise vector, 77 with respect to all m known 
eigenvectors. The matrix inversion in this last 
step is accelerated, since the eigenvectors corre- 
sponding to the low-lying eigenvalues have been 
removed from r\. It will be interesting to see hy- 
brid schemes tested in practical applications. The 
developments of GMRES algorithms presented to 
the conference [Q look particularly interesting in 
this context. 

8. CONCLUSIONS 

For the next few years, most large-scale simu- 
lations of full QCD will use the Wilson or stag- 
gered fermion formulation, while the chiral lattice 
actions [fl9) are beyond the reach of current com- 
puting resources; Ginsparg- Wilson fermions are a 
factor of 100 times more expensive. As pointed 
out by Karl Jansen, the Hybrid Monte-Carlo algo- 
rithm is not far off its twentieth birthday and re- 
mains (largely unmodified) the most popular tool 
for dynamical Wilson fermion simulations. 

Current predictions Q are that resources avail- 
able in the next few years are an order of mag- 
nitude below the requirements for reliable QCD 
simulations. Including the newest ideas pre- 
sented, it might be hoped to get close to the re- 
quired factor of ten improvement in performance. 

Realistic comparative tests of HMC vs LBA 
have been carried out in recent years, leading to 
the observation that the two methods have identi- 
cal costs (for all practical purposes). These com- 
parisons are always very difficult to make. It 
would be helpful to see a standard benchmark 
emerge on which direct comparisons of algorithms 
can be made. Defining a standard is difficult 
but with this in place, fair races of new methods 
against "thoroughbred" algorithms rather than 
the simplest implementations could be made. 

Excitement at the conference generated by the 
recent idea of Liischer and Weisz suggests re- 
considering the traditional Monte-Carlo analysis 
pathway; generating configurations then subse- 
quently making measurements. Their hierarchy 
scheme, which knits these two processes together 
can lead to orders of magnitude increases in effi- 
ciency. The challenge is to apply this philosophy 
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to a wider range of applications. 

I am grateful to Ph. de Forcrand, C. Gebert, 
M. Hasenbusch, B. Joo, A. Kennedy, J. Kuti, M. 
Liischer, I. Montvay, H. Neff, W. Schroers, J. Sex- 
ton and N. Zverev for many stimulating discus- 
sions, observations and helpful correspondence. 
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